Quantitative Analysis in Political Science
Examine R Packages
Importing data from different sources
R is an open-source programming language, meaning that users can contribute packages that make our lives easier, and we can use them for free. Today and in future sessions, we will use several R packages, including:
The suite of tidyverse packages: for data wrangling and data visualization
gapminder: for easy access to an excerpt of the Gapminder data on life expectancy, GDP per capita, and population by country
Install the packages with the install.packages function and load them with the library function. Run the following lines in your console. You only need to install packages once, but you need to load them each time you relaunch RStudio.
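As a minimal sketch, installing and loading the two packages listed above might look like this. The install.packages() call is commented out on the assumption that the packages are already installed; uncomment it the first time you run the code.

```r
# Install once per machine (assumed already done; uncomment on first use).
# install.packages(c("tidyverse", "gapminder"))

# Load the packages in every new R session.
library(tidyverse)
library(gapminder)
```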
The Tidyverse packages share common philosophies and are designed to work together. You can find more about the packages in the tidyverse at https://www.tidyverse.org.
To get started, run the following command to load the data and save it to a local version called gm.
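A minimal sketch of that command, assuming the gapminder package is installed: the data frame shipped with the package is also called gapminder, so we copy it to gm and then print it.

```r
library(gapminder)

# Save a local copy of the packaged data under the name gm.
gm <- gapminder

# Typing the name of an object prints it to the console.
gm
```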
# A tibble: 1,704 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779.
2 Afghanistan Asia 1957 30.3 9240934 821.
3 Afghanistan Asia 1962 32.0 10267083 853.
4 Afghanistan Asia 1967 34.0 11537966 836.
5 Afghanistan Asia 1972 36.1 13079460 740.
6 Afghanistan Asia 1977 38.4 14880372 786.
7 Afghanistan Asia 1982 39.9 12881816 978.
8 Afghanistan Asia 1987 40.8 13867957 852.
9 Afghanistan Asia 1992 41.7 16317921 649.
10 Afghanistan Asia 1997 41.8 22227415 635.
# ℹ 1,694 more rows
However, printing the whole dataset in the console is not that useful.
One advantage of RStudio is that it comes with a built-in data viewer. Click on the name gm in the Environment pane (upper right window) that lists the objects in your environment. This will bring up an alternative display of the data set in the Data Viewer (upper left window). You can close the data viewer by clicking on the x in the upper left hand corner.
You should see seven columns, with each row representing a different combination of country and year. The first entry in each row is simply the row number (an index we can use to access individual rows if we want). The second is the country, followed by the continent in which the country is located and the year. The last three columns give the life expectancy, population, and gross domestic product (GDP) per capita for that country in the given year, respectively. Use the scroll bar on the right side of the Data Viewer to examine the complete data set.
Note that the row numbers in the first column are not part of the data. R adds them as part of its printout to help you make visual comparisons. You can think of them as the index that you see on the left side of a spreadsheet. In fact, the comparison to a spreadsheet will generally be helpful. R has stored this data set in a kind of spreadsheet or table called a data frame.
You can see the dimensions of this data frame as well as the names of the variables and the first few observations by typing:
Rows: 1,704
Columns: 6
$ country <fct> "Afghanistan", "Afghanistan", "Afghanistan", "Afghanistan", …
$ continent <fct> Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, Asia, …
$ year <int> 1952, 1957, 1962, 1967, 1972, 1977, 1982, 1987, 1992, 1997, …
$ lifeExp <dbl> 28.801, 30.332, 31.997, 34.020, 36.088, 38.438, 39.854, 40.8…
$ pop <int> 8425333, 9240934, 10267083, 11537966, 13079460, 14880372, 12…
$ gdpPercap <dbl> 779.4453, 820.8530, 853.1007, 836.1971, 739.9811, 786.1134, …
or
# A tibble: 6 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779.
2 Afghanistan Asia 1957 30.3 9240934 821.
3 Afghanistan Asia 1962 32.0 10267083 853.
4 Afghanistan Asia 1967 34.0 11537966 836.
5 Afghanistan Asia 1972 36.1 13079460 740.
6 Afghanistan Asia 1977 38.4 14880372 786.
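The two printouts above have the form produced by glimpse() and head(). A minimal sketch, assuming gm holds the gapminder data as before; dim() and names() are added here as a quick way to check the dimensions and variable names directly.

```r
library(dplyr)      # provides glimpse()
library(gapminder)

gm <- gapminder

# Dimensions, variable names, types, and the first few values of each column.
glimpse(gm)

# The first six rows.
head(gm)

# Number of rows and columns, and the variable names.
dim(gm)
names(gm)
```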
It is better practice to type this command into your console, since it is not necessary code to include in your solution file.
We can see that there are 1704 observations and 6 variables in this dataset. The variable names are country, continent, year, lifeExp, pop, and gdpPercap.
1). What command would you use to extract just the country names?
We use the ggplot() function to build plots. If you run the plotting code in your console, you should see the plot appear under the Plots tab of the lower right panel of RStudio. Notice that the plotting command again looks like a function call, this time with arguments separated by commas.
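The plotting command itself is not shown here. As a minimal sketch, a scatterplot of the gapminder data could be built as follows; the choice of gdpPercap for the x axis and lifeExp for the y axis is an assumption made for illustration.

```r
library(ggplot2)
library(gapminder)

gm <- gapminder

# First argument: the dataset. aes() maps variables to the x and y axes.
# The + adds a layer; geom_point() draws one point per row.
p <- ggplot(data = gm, aes(x = gdpPercap, y = lifeExp)) +
  geom_point()

# Printing the plot object displays it in the Plots tab.
p
```

Swapping geom_point() for geom_line() in the same call would draw lines instead of points.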
With ggplot(), you assign variables from the dataset to the aesthetic elements of the plot, e.g. the x and the y axes. You then use + to add a layer that specifies the geometric object for the plot. Since we want a scatterplot, we use geom_point(). To draw a line graph instead, replace geom_point() with geom_line().
You might wonder how you are supposed to know the syntax of the ggplot function. Thankfully, R documents all of its functions extensively. To learn what a function does and which arguments are available to you, type a question mark followed by the name of the function that you're interested in. Try the following in your console: ?ggplot
     Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1949 112 118 132 129 121 135 148 148 136 119 104 118
1950 115 126 141 135 125 149 170 170 158 133 114 140
1951 145 150 178 163 172 178 199 199 184 162 146 166
1952 171 180 193 181 183 218 230 242 209 191 172 194
1953 196 196 236 235 229 243 264 272 237 211 180 201
1954 204 188 235 227 234 264 302 293 259 229 203 229
1955 242 233 267 269 270 315 364 347 312 274 237 278
1956 284 277 317 313 318 374 413 405 355 306 271 306
1957 315 301 356 348 355 422 465 467 404 347 305 336
1958 340 318 362 348 363 435 491 505 404 359 310 337
1959 360 342 406 396 420 472 548 559 463 407 362 405
1960 417 391 419 461 472 535 622 606 508 461 390 432
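The table above matches AirPassengers, a monthly time series for 1949 to 1960 that ships with R in the datasets package. The command that printed it is not shown, so the following is only a sketch of how such a built-in dataset can be inspected without importing any file.

```r
# Built-in datasets are available by name; no file import is needed.
AirPassengers

# It is stored as a time series (class "ts"), not a data frame.
class(AirPassengers)

# First time point and number of observations per year.
start(AirPassengers)
frequency(AirPassengers)
```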
# A tibble: 1,704 × 6
country continent year lifeExp pop gdpPercap
<fct> <fct> <int> <dbl> <int> <dbl>
1 Afghanistan Asia 1952 28.8 8425333 779.
2 Afghanistan Asia 1957 30.3 9240934 821.
3 Afghanistan Asia 1962 32.0 10267083 853.
4 Afghanistan Asia 1967 34.0 11537966 836.
5 Afghanistan Asia 1972 36.1 13079460 740.
6 Afghanistan Asia 1977 38.4 14880372 786.
7 Afghanistan Asia 1982 39.9 12881816 978.
8 Afghanistan Asia 1987 40.8 13867957 852.
9 Afghanistan Asia 1992 41.7 16317921 649.
10 Afghanistan Asia 1997 41.8 22227415 635.
# ℹ 1,694 more rows
As we did above, you can set the working directory by passing its location to setwd() in the console. Alternatively, you can “point and click” within RStudio to set the working directory:
Session -> Set Working Directory -> “Choose your option”
Files (in the lower right window) -> “Choose your folder” -> More -> Set As Working Directory
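The console route can be sketched as follows. The folder used here is R's temporary directory, chosen only so that the example runs anywhere; in practice you would pass the path to your own project folder.

```r
# Remember the current working directory.
old_wd <- getwd()

# Change the working directory (tempdir() stands in for your own folder path).
setwd(tempdir())

# Confirm where R is now looking for files.
getwd()

# Switch back to the original directory.
setwd(old_wd)
```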
We will use functions from the tidyverse to load the files.
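As a minimal sketch of loading a file with the tidyverse, the example below uses read_csv() from the readr package. The file name gapminder_example.csv and the object name gm_csv are hypothetical; the file is first written to a temporary folder so that the example is self-contained.

```r
library(readr)
library(gapminder)

# Hypothetical file: write the gapminder data to a temporary CSV first.
csv_path <- file.path(tempdir(), "gapminder_example.csv")
write_csv(gapminder, csv_path)

# Read the CSV back into R as a tibble.
gm_csv <- read_csv(csv_path, show_col_types = FALSE)

# Check the dimensions of the imported data.
dim(gm_csv)
```

Note that read_csv() reads text columns such as country as character vectors rather than factors, so the column types differ slightly from the packaged data.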